pacman::p_load(tmap, sf, dplyr, DT, sp,
stplanr, performance, mapview,
ggpubr, tidyverse, httr,
units, reshape2)Take-home Exercise 2
Setting the Scene
The inquiry focuses on the key motivators prompting city residents to rise early for their daily commutes from home to work, and the consequences of discontinuing public bus services along specific routes. These issues represent significant challenges for transport operators and urban planners.
Traditionally, understanding these dynamics involved conducting extensive commuter surveys. These surveys, however, are expensive, time-intensive, and laborious. Moreover, the data collected often requires extensive processing and analysis, leading to reports that are frequently outdated by the time they are completed.
With the digitalization of urban infrastructure, including public buses, mass rapid transit systems, public utilities, and roads, new opportunities for data collection arise. The integration of pervasive computing technologies like GPS in vehicles and SMART cards among public transport users allows for detailed tracking of movement patterns across time and space.
Despite this, the rapid accumulation of geospatial data has overwhelmed planners’ capacity to effectively analyze and convert it into valuable insights. This inefficiency negatively impacts the return on investment in data collection and management.
Motivation and Objective
The purpose of this take-home project is twofold. First, it addresses the gap in applied research demonstrating the integration, analysis, and modeling of the increasingly available open data for effective policy-making. Despite the abundance of such data, there is a noticeable absence of practical studies showcasing its potential use in policy decisions.
Second, the project aims to fill the void in practical research illustrating the application of geospatial data science and analysis (GDSA) in decision-making processes.
Therefore, the assignment involves conducting a case study to showcase the value of GDSA. This will involve synthesizing publicly accessible data from various sources to construct spatial interaction models. These models will be used to identify and analyze factors influencing the urban mobility patterns of public bus transit.
1.Getting Started
The purpose of pacman is to streamline package management in R. One of its key functions, p_load, is used to install (if not already installed) and then load the specified R packages.
Inside the p_load function, several packages are listed:
tmap: Used for creating thematic maps, which are useful in geospatial analysis.sf: Used for handling and analyzing spatial data.dplyr: Part of the tidyverse, this package provides functions for data manipulation and transformation.DT: Used for interactive display of data tables in R.stplanr: A package tailored for sustainable transport planning, including functions for analyzing spatial lines, networks, etc.performance: Useful for assessing and checking the performance of statistical models.mapview: Facilitates interactive viewing of spatial data in R.ggpubr: Provides functions to create “ggplot2”-based publication ready plots.tidyverse: A collection of R packages designed for data science, including data manipulation, exploration, and visualization.
2.Data Importing
2.1Geospatial Data Importing
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
mpszSimple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 MARINA EAST MESZ01 MARINA EAST ME CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 JURONG ISLAND AND BUKOM WISZ01 WESTERN ISLANDS WI WEST REGION
5 FORT CANNING MUSZ02 MUSEUM MU CENTRAL REGION
6 MARINA EAST (MP) MPSZ05 MARINE PARADE MP CENTRAL REGION
7 SUDONG WISZ03 WESTERN ISLANDS WI WEST REGION
8 SEMAKAU WISZ02 WESTERN ISLANDS WI WEST REGION
9 SOUTHERN GROUP SISZ02 SOUTHERN ISLANDS SI CENTRAL REGION
10 SENTOSA SISZ01 SOUTHERN ISLANDS SI CENTRAL REGION
REGION_C geometry
1 CR MULTIPOLYGON (((33222.98 29...
2 CR MULTIPOLYGON (((28481.45 30...
3 CR MULTIPOLYGON (((28087.34 30...
4 WR MULTIPOLYGON (((14557.7 304...
5 CR MULTIPOLYGON (((29542.53 31...
6 CR MULTIPOLYGON (((35279.55 30...
7 WR MULTIPOLYGON (((15772.59 21...
8 WR MULTIPOLYGON (((19843.41 21...
9 CR MULTIPOLYGON (((30870.53 22...
10 CR MULTIPOLYGON (((26879.04 26...
mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")2.2Aspatial Data Importing
odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")glimpse(odbus)Rows: 5,694,297
Columns: 7
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 2.3Extracting the Study Data
weekday6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekday6_9)We will save the output in rds format for future used.
write_rds(weekday6_9, "data/rds/weekday6_9.rds")The code chunk below will be used to import the save weekday6_9.rds into R environment.
weekday6_9 <- read_rds("data/rds/weekday6_9.rds")weekday17_20 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 &
TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekday17_20)write_rds(weekday17_20, "data/rds/weekday17_20.rds")weekday17_20 <- read_rds("data/rds/weekday17_20.rds")weekend11_14 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 &
TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekend11_14)write_rds(weekend11_14, "data/rds/weekend11_14.rds")weekend11_14 <- read_rds("data/rds/weekend11_14.rds")weekend16_19 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 &
TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekend16_19)write_rds(weekend16_19, "data/rds/weekend16_19.rds")weekend16_19 <- read_rds("data/rds/weekend16_19.rds")4.Geospatial Data Wrangling
4.1Combining busstop and mpsz
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
select(BUS_STOP_N, SUBZONE_C)mapview(busstop_mpsz)4.2Creating Hexagon Layer
Create hexagonal grid based on the extent of mpsz dataset
hex_grid <- st_make_grid(mpsz, cellsize = 2 * 375 / sqrt(3), square = FALSE)Convert the grid to an sf object and add an ID for each hexagon
hex_grid_sf <- st_sf(geometry = hex_grid) %>%
mutate(hex_id = 1:length(hex_grid))Count the number of bus stops in each hexagon
busstop_counts <- st_intersects(hex_grid_sf, busstop_mpsz, sparse = FALSE) %>%
apply(1, function(x) sum(x, na.rm = TRUE))Add bus stop count to hex grid sf object
hex_grid_sf$bus_stop_count <- busstop_countsFilter out hexagons with no bus stops
hex_grid_sf <- hex_grid_sf %>%
filter(bus_stop_count > 0)tmap_options(check.and.fix = TRUE)
map_hexagon <- tm_shape(hex_grid_sf) +
tm_polygons(
col = "bus_stop_count",
palette = "Purples",
style = "cont",
title = "Number of Bus Stops",
id = "hex_id",
showNA = FALSE,
alpha = 0.6,
popup.vars = c("Number of Bus Stops" = "bus_stop_count"),
popup.format = list(bus_stop_count = list(format = "f", digits = 0))
) +
tm_shape(mpsz) +
tm_borders(col = "grey75", lwd = 0.7) +
tm_layout(
main.title = "Bus Stop Distribution in Singapore",
main.title.size = 1.5,
legend.title.size = 1,
legend.text.size = 0.8,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Data Source: LTA DataMall, Data.gov.sg", position = c("RIGHT", "BOTTOM"), size = 0.8) +
tm_view(view.legend.position = c("left", "bottom"))
map_hexagon
Next, we are going to append the planning subzone code from busstop_mpsz data frame onto odbus6_9 data frame.
busstop_hex <- st_intersection(busstop, hex_grid_sf) %>%
select(BUS_STOP_N, hex_id) %>%
st_drop_geometry()busstop_hex <- na.omit(busstop_hex)head(busstop_hex) BUS_STOP_N hex_id
3269 25059 393
254 26379 444
2570 25751 488
4203 26389 490
2403 26369 491
2897 25761 535
write_rds(busstop_hex, "data/rds/busstop_hex.rds") rm(hex_grid, map_hexagon, odbus, busstop_counts)5.Preparing Commute Flow Data
5.1Weekday Morning Peak
weekday_morning_od <- left_join(weekday6_9 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)Before continue, it is a good practice for us to check for duplicating records.
duplicate <- weekday_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()If duplicated records are found, the code chunk below will be used to retain the unique records.
weekday_morning_od <- unique(weekday_morning_od)weekday_morning_od <- left_join(weekday_morning_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekday_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_morning_od <- unique(weekday_morning_od)weekday_morning_od <- weekday_morning_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKDAY_MORNING_PEAK = sum(TRIPS))It is time to save the output into an rds file format.
write_rds(weekday_morning_od, "data/rds/weekday_morning_od.rds")weekday_morning_od <- read_rds("data/rds/weekday_morning_od.rds")5.2Weekday Afternoon Peak
weekday_afternoon_od <- left_join(weekday17_20 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekday_afternoon_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_afternoon_od <- unique(weekday_afternoon_od)weekday_afternoon_od <- left_join(weekday_afternoon_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekday_afternoon_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_afternoon_od <- unique(weekday_afternoon_od)weekday_afternoon_od <- weekday_afternoon_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKDAY_AFTERNOON_PEAK = sum(TRIPS))write_rds(weekday_afternoon_od, "data/rds/weekday_afternoon_od.rds")weekday_afternoon_od <- read_rds("data/rds/weekday_afternoon_od.rds")5.3Weekend Morning Peak
weekend_morning_od <- left_join(weekend11_14 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekend_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_morning_od <- unique(weekend_morning_od)weekend_morning_od <- left_join(weekend_morning_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekend_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_morning_od <- unique(weekend_morning_od)weekend_morning_od <- weekend_morning_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKEND_MORNING_PEAK = sum(TRIPS))write_rds(weekend_morning_od, "data/rds/weekend_morning_od.rds")weekend_morning_od <- read_rds("data/rds/weekend_morning_od.rds")5.4Weekend Evening Peak
weekend_evening_od <- left_join(weekend16_19 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekend_evening_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_evening_od <- unique(weekend_evening_od)weekend_evening_od <- left_join(weekend_evening_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekend_evening_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_evening_od <- unique(weekend_evening_od)weekend_evening_od <- weekend_evening_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKEND_EVENING_PEAK = sum(TRIPS))write_rds(weekend_evening_od, "data/rds/weekend_evening_od.rds")weekend_evening_od <- read_rds("data/rds/weekend_evening_od.rds")rm(weekday6_9, weekday17_20, weekend11_14, weekend16_19)6.Visualising Spatial Interaction
6.1Removing Intra-zonal Flows
weekday_morning_od1 <- weekday_morning_od[weekday_morning_od$ORIGIN_HEX!=weekday_morning_od$DESTIN_HEX,]weekday_afternoon_od1 <- weekday_afternoon_od[weekday_afternoon_od$ORIGIN_HEX!=weekday_afternoon_od$DESTIN_HEX,]weekend_morning_od1 <- weekend_morning_od[weekend_morning_od$ORIGIN_HEX!=weekend_morning_od$DESTIN_HEX,]weekend_evening_od1 <- weekend_evening_od[weekend_evening_od$ORIGIN_HEX!=weekend_evening_od$DESTIN_HEX,]6.2Creating the Desire Lines
weekday_morning_flowLine <- od2line(flow = weekday_morning_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekday_afternoon_flowLine <- od2line(flow = weekday_afternoon_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekend_morning_flowLine <- od2line(flow = weekend_morning_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekend_evening_flowLine <- od2line(flow = weekend_evening_od1,
zones = hex_grid_sf,
zone_code = "hex_id")6.3Visualising the Desire Lines
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekday_morning_flowLine %>%
filter(WEEKDAY_MORNING_PEAK >= 5000)) +
tm_lines(
lwd = "WEEKDAY_MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekday Morning Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekday_afternoon_flowLine %>%
filter(WEEKDAY_AFTERNOON_PEAK >= 5000)) +
tm_lines(
lwd = "WEEKDAY_AFTERNOON_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekday Afternoon Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekend_morning_flowLine %>%
filter(WEEKEND_MORNING_PEAK >= 3000)) +
tm_lines(
lwd = "WEEKEND_MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 13, 15),
n = 7,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekend Morning Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekend_evening_flowLine %>%
filter(WEEKEND_EVENING_PEAK >= 3000)) +
tm_lines(
lwd = "WEEKEND_EVENING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekend Evening Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
rm(duplicate, weekday_morning_flowLine, weekday_morning_od, weekday_morning_od1,
weekend_morning_flowLine, weekend_morning_od, weekend_morning_od1,
weekend_evening_flowLine, weekend_evening_od, weekend_evening_od1)7.Attractiveness Factors
7.1Data Integration
entertn <- st_read(dsn = "data/geospatial",
layer = "entertn")Reading layer `entertn' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$entertn_count <- lengths(st_intersects(hex_grid_sf, entertn))FB <- st_read(dsn = "data/geospatial",
layer = "F&B")Reading layer `F&B' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$FB_count <- lengths(st_intersects(hex_grid_sf, FB))lere <- st_read(dsn = "data/geospatial",
layer = "Liesure&Recreation")Reading layer `Liesure&Recreation' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$lere_count <- lengths(st_intersects(hex_grid_sf, lere))retail <- st_read(dsn = "data/geospatial",
layer = "Retails")Reading layer `Retails' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$retail_count <- lengths(st_intersects(hex_grid_sf, retail))trainexits <- st_read(dsn = "data/geospatial",
layer = "Train_Station_Exit_Layer")Reading layer `Train_Station_Exit_Layer' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21
trainexits <- st_transform(trainexits, st_crs(hex_grid_sf))
hex_grid_sf$trainexits_count <- lengths(st_intersects(hex_grid_sf, trainexits))hdb <- read_csv("data/aspatial/hdb.csv")residential <- hdb %>%
filter(residential == "Y") %>%
select(lat, lng, total_dwelling_units) %>%
na.omit()residential <- st_as_sf(residential,
coords = c("lng", "lat"),
crs = 4326) %>%
st_transform(crs = st_crs(hex_grid_sf))intersections <- st_intersects(hex_grid_sf, residential, sparse = TRUE)
hex_grid_sf$residential_count <- mapply(function(index, residential) {
sum(residential$total_dwelling_units[index], na.rm = TRUE)
}, intersections, MoreArgs = list(residential = residential))rm(intersections)8.Propulsive Factors
8.1Data Integration
business <- st_read(dsn = "data/geospatial",
layer = "Business")Reading layer `Business' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$business_count <- lengths(st_intersects(hex_grid_sf, business))url<-"https://www.onemap.gov.sg/api/common/elastic/search"
csv<-read_csv("data/aspatial/Generalinformationofschools.csv")
postcodes<-csv$`postal_code`
found<-data.frame()
not_found<-data.frame()
for(postcode in postcodes){
query<-list('searchVal'=postcode,'returnGeom'='Y','getAddrDetails'='Y','pageNum'='1')
res<- GET(url,query=query)
if((content(res)$found)!=0){
found<-rbind(found,data.frame(content(res))[4:13])
} else{
not_found = data.frame(postcode)
}
}merged = merge(csv, found, by.x = 'postal_code', by.y = 'results.POSTAL', all = TRUE)
write.csv(merged, file = "data/aspatial/schools.csv")
write.csv(not_found, file = "data/aspatial/not_found.csv")schools <- read_csv("data/aspatial/schools.csv") %>%
rename(latitude = "results.LATITUDE",
longitude = "results.LONGITUDE")%>%
select(postal_code, school_name, latitude, longitude)schools <- schools %>%
filter(!is.na(longitude) & !is.na(latitude)) %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(crs = 3414)hex_grid_sf$school_count <- lengths(st_intersects(hex_grid_sf, schools))rm(csv, found, merged, not_found, query, res,
postcode, postcodes, url)finserv <- st_read(dsn = "data/geospatial",
layer = "FinServ")Reading layer `FinServ' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$finserv_count <- lengths(st_intersects(hex_grid_sf, finserv))8.2Final Check
datatable(hex_grid_sf)9.Computing Distance Matrix
9.1Converting from sf data.table to SpatialPolygonsDataFrame
hex_grid_sp <- as(hex_grid_sf, "Spatial")
hex_grid_spclass : SpatialPolygonsDataFrame
features : 1853
extent : 3966.576, 48566.88, 26373.72, 50123.72 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 11
names : hex_id, bus_stop_count, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count, business_count, school_count, finserv_count
min values : 393, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
max values : 9988, 13, 8, 71, 26, 986, 9, 3531, 60, 4, 91
9.2 Computing the distance matrix
dist <- spDists(hex_grid_sp,
longlat = FALSE)
head(dist, n=c(10, 10)) [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.0000 2633.9134 866.0254 2291.2878 3031.0889 750.0000 1984.3135
[2,] 2633.9134 0.0000 1887.4586 433.0127 433.0127 2291.2878 866.0254
[3,] 866.0254 1887.4586 0.0000 1500.0000 2250.0000 433.0127 1145.6439
[4,] 2291.2878 433.0127 1500.0000 0.0000 750.0000 1887.4586 433.0127
[5,] 3031.0889 433.0127 2250.0000 750.0000 0.0000 2633.9134 1145.6439
[6,] 750.0000 2291.2878 433.0127 1887.4586 2633.9134 0.0000 1500.0000
[7,] 1984.3135 866.0254 1145.6439 433.0127 1145.6439 1500.0000 0.0000
[8,] 4175.8233 1561.2495 3381.9373 1887.4586 1145.6439 3750.0000 2250.0000
[9,] 1732.0508 1299.0381 866.0254 866.0254 1561.2495 1145.6439 433.0127
[10,] 2410.9127 750.0000 1561.2495 433.0127 866.0254 1887.4586 433.0127
[,8] [,9] [,10]
[1,] 4175.823 1732.0508 2410.9127
[2,] 1561.249 1299.0381 750.0000
[3,] 3381.937 866.0254 1561.2495
[4,] 1887.459 866.0254 433.0127
[5,] 1145.644 1561.2495 866.0254
[6,] 3750.000 1145.6439 1887.4586
[7,] 2250.000 433.0127 433.0127
[8,] 0.000 2633.9134 1887.4586
[9,] 2633.913 0.0000 750.0000
[10,] 1887.459 750.0000 0.0000
9.3Labeling Column and Row Headers of a Distance Matrix
hex_names <- hex_grid_sf$hex_idcolnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)9.4Pivoting Distance Value by HEX_ID
distPair <- melt(dist) %>%
rename(dist = value)
head(distPair, 10) Var1 Var2 dist
1 393 393 0.0000
2 444 393 2633.9134
3 488 393 866.0254
4 490 393 2291.2878
5 491 393 3031.0889
6 535 393 750.0000
7 537 393 1984.3135
8 540 393 4175.8233
9 583 393 1732.0508
10 584 393 2410.9127
9.5Updating Intra-zonal Distances
distPair %>%
filter(dist > 0) %>%
summary() Var1 Var2 dist
Min. : 393 Min. : 393 Min. : 433
1st Qu.:3694 1st Qu.:3694 1st Qu.: 7949
Median :5474 Median :5474 Median :12933
Mean :5236 Mean :5236 Mean :13705
3rd Qu.:6837 3rd Qu.:6837 3rd Qu.:18407
Max. :9988 Max. :9988 Max. :44478
distPair$dist <- ifelse(distPair$dist == 0,
200, distPair$dist)distPair %>%
summary() Var1 Var2 dist
Min. : 393 Min. : 393 Min. : 200
1st Qu.:3694 1st Qu.:3694 1st Qu.: 7949
Median :5474 Median :5474 Median :12933
Mean :5236 Mean :5236 Mean :13698
3rd Qu.:6837 3rd Qu.:6837 3rd Qu.:18407
Max. :9988 Max. :9988 Max. :44478
distPair <- distPair %>%
rename(orig = Var1,
dest = Var2)write_rds(distPair, "data/rds/distPair.rds") 10.Combining Passenger Volume Data with Distance Value
weekday_afternoon_od1$ORIGIN_HEX <- as.factor(weekday_afternoon_od1$ORIGIN_HEX)
weekday_afternoon_od1$DESTIN_HEX <- as.factor(weekday_afternoon_od1$DESTIN_HEX)
distPair$orig <- as.factor(distPair$orig)
distPair$dest <- as.factor(distPair$dest)weekday_afternoon_od2 <- weekday_afternoon_od1 %>%
left_join (distPair,
by = c("ORIGIN_HEX" = "orig",
"DESTIN_HEX" = "dest"))hex_grid_df <- as.data.frame(hex_grid_sf) %>%
select(hex_id, bus_stop_count, business_count, school_count, finserv_count,
entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count) %>%
mutate(hex_id = as.character(hex_id))origin_factors <- hex_grid_df %>%
select(hex_id, bus_stop_count, business_count, school_count, finserv_count)
weekday_afternoon_od2 <- weekday_afternoon_od2 %>%
mutate(ORIGIN_HEX = as.character(ORIGIN_HEX),
DESTIN_HEX = as.character(DESTIN_HEX))weekday_afternoon_od2_with_origin <- weekday_afternoon_od2 %>%
left_join(origin_factors, by = c("ORIGIN_HEX" = "hex_id"))destin_factors <- hex_grid_df %>%
select(hex_id, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count)weekday_afternoon_od2_complete <- weekday_afternoon_od2_with_origin %>%
left_join(destin_factors, by = c("DESTIN_HEX" = "hex_id"))glimpse(weekday_afternoon_od2_complete)Rows: 161,671
Columns: 14
Groups: ORIGIN_HEX [1,808]
$ ORIGIN_HEX <chr> "393", "393", "393", "393", "393", "393", "393"…
$ DESTIN_HEX <chr> "535", "585", "723", "770", "779", "824", "827"…
$ WEEKDAY_AFTERNOON_PEAK <dbl> 3, 34, 182, 1, 1, 18, 6, 145, 2, 16, 250, 23, 2…
$ dist <dbl> 750.000, 3122.499, 1561.249, 1887.459, 7697.402…
$ bus_stop_count <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ business_count <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0,…
$ school_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ finserv_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ entertn_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ FB_count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ lere_count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ retail_count <int> 0, 1, 0, 0, 0, 0, 3, 0, 2, 0, 3, 9, 2, 1, 0, 0,…
$ trainexits_count <int> 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0,…
$ residential_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
write_rds(weekday_afternoon_od2_complete, "data/rds/SIM_data.rds")rm(business, busstop, busstop_hex, busstop_mpsz, destin_factors, dist, distPair, entertn, FB, finserv, hdb, hex_grid_df, hex_grid_sf, hex_grid_sp, lere, mpsz, origin_factors, residential, retail, schools, trainexits, weekday_afternoon_flowLine, weekday_afternoon_od, weekday_afternoon_od1, weekday_afternoon_od2, weekday_afternoon_od2_complete, weekday_afternoon_od2_with_origin, hex_names)11.Calibrating Spatial Interaction Models
Importing the modelling data
SIM_data <- read_rds("data/rds/SIM_data.rds")Visualising the dependent variable
ggplot(data = SIM_data,
aes(x = WEEKDAY_AFTERNOON_PEAK)) +
geom_histogram()
ggplot(data = SIM_data,
aes(x = dist,
y = WEEKDAY_AFTERNOON_PEAK)) +
geom_point() +
geom_smooth(method = lm)
ggplot(data = SIM_data,
aes(x = log(dist),
y = log(WEEKDAY_AFTERNOON_PEAK))) +
geom_point() +
geom_smooth(method = lm)
Checking for Variables with Zero Values
summary(SIM_data) ORIGIN_HEX DESTIN_HEX WEEKDAY_AFTERNOON_PEAK dist
Length:161671 Length:161671 Min. : 1.0 Min. : 433
Class :character Class :character 1st Qu.: 4.0 1st Qu.: 2411
Mode :character Mode :character Median : 18.0 Median : 4684
Mean : 148.4 Mean : 5658
3rd Qu.: 73.0 3rd Qu.: 8020
Max. :59391.0 Max. :25159
bus_stop_count business_count school_count finserv_count
Min. : 1.000 Min. : 0.000 Min. :0.0000 Min. : 0.000
1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 0.000
Median : 3.000 Median : 0.000 Median :0.0000 Median : 1.000
Mean : 3.401 Mean : 2.556 Mean :0.1795 Mean : 3.944
3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.:0.0000 3rd Qu.: 4.000
Max. :13.000 Max. :60.000 Max. :4.0000 Max. :91.000
entertn_count FB_count lere_count retail_count
Min. :0.0000 Min. : 0.000 Min. : 0.0000 Min. : 0.00
1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 2.00
Median :0.0000 Median : 0.000 Median : 0.0000 Median : 7.00
Mean :0.1352 Mean : 2.135 Mean : 0.8934 Mean : 38.84
3rd Qu.:0.0000 3rd Qu.: 1.000 3rd Qu.: 1.0000 3rd Qu.: 31.00
Max. :8.0000 Max. :71.000 Max. :26.0000 Max. :986.00
trainexits_count residential_count
Min. :0.0000 Min. : 0.0
1st Qu.:0.0000 1st Qu.: 0.0
Median :0.0000 Median : 219.0
Mean :0.6436 Mean : 674.6
3rd Qu.:0.0000 3rd Qu.:1264.0
Max. :9.0000 Max. :3531.0
SIM_data$business_count <- ifelse(
SIM_data$business_count == 0,
0.99, SIM_data$business_count)
SIM_data$school_count <- ifelse(
SIM_data$school_count == 0,
0.99, SIM_data$school_count)
SIM_data$finserv_count <- ifelse(
SIM_data$finserv_count == 0,
0.99, SIM_data$finserv_count)
SIM_data$entertn_count <- ifelse(
SIM_data$entertn_count == 0,
0.99, SIM_data$entertn_count)
SIM_data$FB_count <- ifelse(
SIM_data$FB_count == 0,
0.99, SIM_data$FB_count)
SIM_data$lere_count <- ifelse(
SIM_data$lere_count == 0,
0.99, SIM_data$lere_count)
SIM_data$retail_count <- ifelse(
SIM_data$retail_count == 0,
0.99, SIM_data$retail_count)
SIM_data$trainexits_count <- ifelse(
SIM_data$trainexits_count == 0,
0.99, SIM_data$trainexits_count)
SIM_data$residential_count <- ifelse(
SIM_data$residential_count == 0,
0.99, SIM_data$residential_count)summary(SIM_data) ORIGIN_HEX DESTIN_HEX WEEKDAY_AFTERNOON_PEAK dist
Length:161671 Length:161671 Min. : 1.0 Min. : 433
Class :character Class :character 1st Qu.: 4.0 1st Qu.: 2411
Mode :character Mode :character Median : 18.0 Median : 4684
Mean : 148.4 Mean : 5658
3rd Qu.: 73.0 3rd Qu.: 8020
Max. :59391.0 Max. :25159
bus_stop_count business_count school_count finserv_count
Min. : 1.000 Min. : 0.990 Min. :0.990 Min. : 0.990
1st Qu.: 2.000 1st Qu.: 0.990 1st Qu.:0.990 1st Qu.: 0.990
Median : 3.000 Median : 0.990 Median :0.990 Median : 1.000
Mean : 3.401 Mean : 3.138 Mean :1.018 Mean : 4.391
3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.:0.990 3rd Qu.: 4.000
Max. :13.000 Max. :60.000 Max. :4.000 Max. :91.000
entertn_count FB_count lere_count retail_count
Min. :0.990 Min. : 0.990 Min. : 0.990 Min. : 0.99
1st Qu.:0.990 1st Qu.: 0.990 1st Qu.: 0.990 1st Qu.: 2.00
Median :0.990 Median : 0.990 Median : 0.990 Median : 7.00
Mean :1.045 Mean : 2.862 Mean : 1.539 Mean : 38.99
3rd Qu.:0.990 3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.: 31.00
Max. :8.000 Max. :71.000 Max. :26.000 Max. :986.00
trainexits_count residential_count
Min. :0.990 Min. : 0.99
1st Qu.:0.990 1st Qu.: 0.99
Median :0.990 Median : 219.00
Mean :1.399 Mean : 675.02
3rd Qu.:0.990 3rd Qu.:1264.00
Max. :9.000 Max. :3531.00
Unconstrained Spatial Interaction Model
uncSIM <- glm(formula = WEEKDAY_AFTERNOON_PEAK ~
log(bus_stop_count) +
log(business_count) +
log(school_count) +
log(finserv_count) +
log(entertn_count) +
log(FB_count) +
log(lere_count) +
log(retail_count) +
log(trainexits_count) +
log(residential_count) +
log(dist),
family = poisson(link = "log"),
data = SIM_data,
na.action = na.exclude)
uncSIM
Call: glm(formula = WEEKDAY_AFTERNOON_PEAK ~ log(bus_stop_count) +
log(business_count) + log(school_count) + log(finserv_count) +
log(entertn_count) + log(FB_count) + log(lere_count) + log(retail_count) +
log(trainexits_count) + log(residential_count) + log(dist),
family = poisson(link = "log"), data = SIM_data, na.action = na.exclude)
Coefficients:
(Intercept) log(bus_stop_count) log(business_count)
11.67751 0.39921 -0.06785
log(school_count) log(finserv_count) log(entertn_count)
-0.30247 0.42730 0.05798
log(FB_count) log(lere_count) log(retail_count)
-0.18000 -0.10078 0.05821
log(trainexits_count) log(residential_count) log(dist)
0.69354 0.12207 -1.04914
Degrees of Freedom: 161670 Total (i.e. Null); 161659 Residual
Null Deviance: 96290000
Residual Deviance: 59350000 AIC: 60130000
R-squared Function
CalcRSquared <- function(observed,estimated){
r <- cor(observed,estimated)
R2 <- r^2
R2
}CalcRSquared(uncSIM$data$WEEKDAY_AFTERNOON_PEAK, uncSIM$fitted.values)[1] 0.1063211
r2_mcfadden(uncSIM)# R2 for Generalized Linear Regression
R2: 0.381
adj. R2: 0.381